JH Thought Lab

Podscribe: System Design

First published: 2020-06-20. Last updated: 2020-06-21.

This post describes the technical design of Podscribe. For the rationale and vision behind Podscribe, see https://site.j-henderson.com/blog/Podscribe:%20Vision.

GitHub repo is here: https://github.com/jrhender/Podscribe

Overall Architecture

My planned architecture for Podscribe consists of the following components. I plan to use Google Cloud for much of it because its "always free usage limits" are relatively generous.

High level system design of Podscribe

Components

Web frontend

A single-page application (SPA), probably built with React. In addition to rendering the UI and calling the backend services, the web frontend will handle downloading and hashing podcast files.
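The hashing step can be sketched as follows. This is only an illustration of the logic, written in Python rather than in the browser code the frontend would actually use. The choice of SHA-256, the chunk size, and the function name hash_audio are all assumptions; the design does not specify a hash algorithm.

```python
import hashlib
import io
from typing import BinaryIO

# Assumption: read 64 KiB at a time so a long episode is never held in memory at once.
CHUNK_SIZE = 64 * 1024


def hash_audio(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the hex SHA-256 digest of an audio stream.

    SHA-256 is an assumed choice; any stable content hash would serve as a key.
    """
    digest = hashlib.sha256()
    # iter() with a sentinel keeps calling read() until it returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    fake_episode = io.BytesIO(b"not really audio")  # stand-in for a downloaded file
    print(hash_audio(fake_episode))
```

Because the digest depends only on the file contents, two users who download the same episode file would produce the same key, which is what lets an existing transcript be reused.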

API Gateway

An API gateway to handle auth, logging, and monitoring. I am planning on using Google Cloud Endpoints.

Transcription API

I am currently planning on a simple REST service. This may need to change, depending on how long the actual transcription takes. Possible future extensions include an API to list all available transcripts for a given podcast, or the ability to stitch together sections of a podcast.

Transcribe

transcribe(audioFile, fileHash, podcastName=None, podcastEpisode=None, startTime=None, endTime=None) : Transcribes an audio file and stores the resulting transcript. If the file has already been transcribed, the stored transcript is returned directly. This endpoint calls the external Speech-to-Text API (which is relatively costly 💸), so only authenticated users (possibly with valid payment details) can use it.

Parameters:

audioFile (binary): The audio file to transcribe. Encoded as application/x-www-form-urlencoded or multipart/form-data.
fileHash (string): The hash of the file to use as the key for transcript storage
podcastName (string): Optional
podcastEpisode (string): Optional
startTime (string): Optional. The start time within the episode of the file in question. Could be useful when retrieving a partial transcript and for stitching transcripts together.
endTime (string): Optional. The end time within the episode of the file in question.

Returns:

(string) Transcript of audio file.
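The "return the stored transcript if the file was already transcribed" behaviour can be sketched as below. This is a minimal sketch under stated assumptions, not the actual service: the store is a plain dictionary standing in for the real database, and speech_to_text is a hypothetical callable standing in for the external Speech-to-Text API. Auth and HTTP handling are left out.

```python
from typing import Callable, Dict, Optional

# Assumption: an in-memory dict keyed by file hash stands in for transcript storage.
TranscriptStore = Dict[str, dict]


def transcribe(
    audio_file: bytes,
    file_hash: str,
    store: TranscriptStore,
    speech_to_text: Callable[[bytes], str],  # hypothetical wrapper around the paid API
    podcast_name: Optional[str] = None,
    podcast_episode: Optional[str] = None,
    start_time: Optional[str] = None,
    end_time: Optional[str] = None,
) -> str:
    """Return the transcript for audio_file, calling the costly API only on a cache miss."""
    cached = store.get(file_hash)
    if cached is not None:
        # Already transcribed: skip the external call entirely.
        return cached["transcript"]

    transcript = speech_to_text(audio_file)
    # Store the transcript together with whatever metadata the caller supplied.
    store[file_hash] = {
        "transcript": transcript,
        "podcastName": podcast_name,
        "podcastEpisode": podcast_episode,
        "startTime": start_time,
        "endTime": end_time,
    }
    return transcript


if __name__ == "__main__":
    store: TranscriptStore = {}
    fake_api = lambda audio: "hello and welcome to the show"  # placeholder, no real API call
    print(transcribe(b"audio bytes", "abc123", store, fake_api, podcast_name="Example Show"))
    print(transcribe(b"audio bytes", "abc123", store, fake_api))  # served from the store
```

Checking the store before calling the external API is what keeps repeat requests for the same file from incurring the transcription cost twice.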

GetTranscript

getTranscript(fileHash) OR getTranscript(podcastName, podcastEpisode): Retrieves a transcript. This API can be made publicly accessible.

Parameters:

fileHash (string): The hash of the file, used as the key for transcript storage.
podcastName (string): Name of podcast
podcastEpisode (string): Episode of podcast

Returns:

(string) Transcript if available or a "not yet transcribed" message.
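The two lookup paths (by hash, or by podcast name and episode) could look something like this. It is a sketch only: the dictionary store, the field names inside each stored document, and the exact wording of the "not yet transcribed" message are assumptions made for illustration.

```python
from typing import Dict, Optional

# Assumption: the exact message text is not specified in the design.
NOT_YET_TRANSCRIBED = "not yet transcribed"


def get_transcript(
    store: Dict[str, dict],  # assumed in-memory stand-in for transcript storage
    file_hash: Optional[str] = None,
    podcast_name: Optional[str] = None,
    podcast_episode: Optional[str] = None,
) -> str:
    """Look up a transcript by file hash, or by podcast name plus episode."""
    if file_hash is not None:
        # The hash is the storage key, so this is a direct lookup.
        doc = store.get(file_hash)
        return doc["transcript"] if doc else NOT_YET_TRANSCRIBED

    if podcast_name is None or podcast_episode is None:
        raise ValueError("Provide file_hash, or both podcast_name and podcast_episode")

    # Without the hash, fall back to scanning the metadata. A real database
    # would use a query or index here instead of a linear scan.
    for doc in store.values():
        if doc.get("podcastName") == podcast_name and doc.get("podcastEpisode") == podcast_episode:
            return doc["transcript"]
    return NOT_YET_TRANSCRIBED


if __name__ == "__main__":
    store = {
        "abc123": {
            "transcript": "hello and welcome",
            "podcastName": "Example Show",
            "podcastEpisode": "Episode 1",
        }
    }
    print(get_transcript(store, file_hash="abc123"))
    print(get_transcript(store, podcast_name="Example Show", podcast_episode="Episode 2"))
```

Since this path never calls the external Speech-to-Text API, it costs nothing per request beyond a storage read, which fits with making it publicly accessible.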

Transcription Storage

I'm planning on using a NoSQL database of some sort, probably a document database. Maybe MongoDB, maybe Firestore. Each transcript and its associated metadata will be a document.
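One possible shape for such a document is sketched below. The field names mirror the parameters of the transcribe endpoint, but the schema itself is an assumption; neither the class name TranscriptDocument nor the to_document helper comes from the design, and no database client is involved.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptDocument:
    """Hypothetical schema: one transcript plus its metadata."""

    file_hash: str  # would serve as the document key
    transcript: str
    podcast_name: Optional[str] = None
    podcast_episode: Optional[str] = None
    start_time: Optional[str] = None
    end_time: Optional[str] = None

    def to_document(self) -> dict:
        """Convert to a plain dict using the camelCase names from the API description."""
        return {
            "fileHash": self.file_hash,
            "transcript": self.transcript,
            "podcastName": self.podcast_name,
            "podcastEpisode": self.podcast_episode,
            "startTime": self.start_time,
            "endTime": self.end_time,
        }


if __name__ == "__main__":
    doc = TranscriptDocument(
        file_hash="abc123",
        transcript="hello and welcome",
        podcast_name="Example Show",
        podcast_episode="Episode 1",
    )
    # A plain dict like this is the kind of value a document database stores.
    print(json.dumps(doc.to_document(), indent=2))
```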

Podcast Search API

Used to find a podcast's RSS feed and download information.

getRSS

getRSS(podcastName): Retrieves the RSS feed URL for a podcast. This API can be made publicly accessible. There is almost certainly already an API for this somewhere (maybe ListenNotes.com or the Apple Lookup API).

Parameters:

podcastName (string): Name of podcast

Returns:

(string) RSS feed URL
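As a rough sketch of how getRSS might delegate to an existing service, the example below uses Apple's iTunes Search API, a sibling of the Lookup API mentioned above. Treat the details as assumptions to verify against Apple's documentation: the query parameters chosen here, and the expectation that each podcast result carries a feedUrl field. The function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

ITUNES_SEARCH_URL = "https://itunes.apple.com/search"


def build_search_url(podcast_name: str) -> str:
    """Build a search URL restricted to podcasts, asking for a single best match."""
    query = urlencode({"term": podcast_name, "media": "podcast", "limit": 1})
    return f"{ITUNES_SEARCH_URL}?{query}"


def extract_feed_url(payload: dict) -> Optional[str]:
    """Return the first feed URL found in a parsed search response, if any."""
    for result in payload.get("results", []):
        feed_url = result.get("feedUrl")  # assumed field name, check the API docs
        if feed_url:
            return feed_url
    return None


def get_rss(podcast_name: str) -> Optional[str]:
    """Fetch the search results over the network and pull out the RSS feed URL."""
    with urlopen(build_search_url(podcast_name), timeout=10) as response:
        payload = json.load(response)
    return extract_feed_url(payload)


if __name__ == "__main__":
    # Offline demonstration with a hand-written payload, so no network call is made.
    sample = {"resultCount": 1, "results": [{"feedUrl": "https://example.com/feed.xml"}]}
    print(build_search_url("Example Show"))
    print(extract_feed_url(sample))
```

Keeping URL building and response parsing separate from the network call makes both easy to test, and makes it simpler to swap in a different provider such as ListenNotes later.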

Auth Service

This probably isn't needed until Podscribe has users other than me, but I will probably use Auth0, which is supported by Google Cloud Endpoints.